DESCRIPTION
VISUAL RECOGNITION METHOD 

Technical Field 

This invention relates to a method for identifying the 
location and orientation of a known article within a visual 
field. 

Background Art 

U.S. Pat. No. 5,379,353 suggests a differential 
analysis circuit that utilizes a step to identify edge 
vectors for identification of such things as a road for a 
mobile robot. A digital image captured from a video camera 
is processed using an algorithm that includes generation of 
a differential of brightness along each row of pixels, and 
presumably also along each column of pixels. The absolute 
value of the differential brightness represents a change in 
the picture, and a differential that exceeds a threshold is 
identified as a possible edge to a road. 

U.S. Pat. No. 5,381,155 suggests a speed detection 
system that identifies moving vehicles in the view of a 
fixed camera, measures the speed at which the vehicles are 
moving, and identifies a license plate number from the
vehicle. Commercially available systems are disclosed that
are said to be capable of identifying the license plate
within a captured image and then reading the numbers and
letters within the license plate number.

U.S. Pat. No. 5,381,489 suggests a system for 
recognition of characters on a medium. This system includes 
making a window of a possible character from the medium, and 
then comparing that window to each template within a set. 
The entire set of templates must be screened each time a 
character is identified. The templates are generated based 
on previously recognized characters from the document where 
the initial recognition requires a more rigorous comparison 
to different character features. 

A problem faced in visual recognition is to recognize
the location, within the view of a camera, and the
orientation of a particular article, where the article may
be one of a relatively few possible articles already
identified from a library of potential articles. The
possibility of a variety of lighting conditions and shadows
makes such recognition difficult. There are also typically
constraints on the amount of computer data storage available
at the site of a desired visual recognition facility.
Therefore, templates of different orientations and scales of
the different articles generally cannot be generated and
stored initially.

Such a problem in visual recognition is encountered 
when visual recognition is used as a means to identify 
vehicles or determine the orientation of vehicles in an 
automated refuelling system. For example, in U.S. Pat. No.
3,527,268 it is suggested that vehicle identification in an
automated refuelling system can be achieved in a fully 
automated method by a photo-electric means to detect the 
silhouette of the automobile. How this is to be done is not 
suggested. 

It is therefore an object of the present invention to
provide a method for identifying the location and orientation
of an article, wherein the method is capable of identifying
the location and orientation of the article in a variety of
natural and artificial lighting conditions, and wherein a
large number of templates does not have to be digitally
stored.

Disclosure of the Invention 

These and other objects of the invention are
accomplished by a method to identify the location and
orientation of an article, the method comprising the steps
of: obtaining an image of the article with the article
having a known orientation and location relative to a
camera; creating an X and Y template edge matrix from the
image of the article; creating a plurality of sets of
modified template edge matrices, each of the sets of
modified template edge matrices being an X and Y template
edge matrix with the article in a different orientation;
capturing a digital visual image containing the article,
the digital image being a matrix of pixels; creating X and
Y article edge matrices from the matrix of pixels;
quantifying the difference between each of the sets of
modified template edge matrices and the X and Y article edge
matrices with the modified template edge matrices placed at
a plurality of locations within the bounds of the article
edge matrices; and identifying the location and orientation
of the article as the orientation of the article represented
by the set of modified template edge matrices at the
location within the bounds of the X and Y article edge
matrices with the minimal quantified differences between the
modified template edge matrices and the X and Y article edge
matrices.

This method can be readily adapted to identification 
of a location and orientation of a vehicle within a bay for 
automated refuelling purposes. The make and model of the 
vehicle can be identified by another means, such as for 
example, driver manual input, a magnetic or optical strip, 
or a passive or active transponder located on the vehicle. 
With the make and model identified, or limited to one of a
small number of possibilities (such as when more than one
transponder signal is being received), base templates can
be retrieved from storage. The base templates can be
prepared from a digital visual image of the known make and
model of vehicle with the vehicle positioned at a known
location with respect to the camera, and the image processed
by generation of X and Y edge matrices. After the make and
model of the vehicle are identified, a series of modified
templates is created from the retrieved templates by
rotation of the template edge matrices to different angles
from the initial orientation and/or scaling the matrices to
represent different distances from the camera. Thus, only
one set of base templates (or one set of X and Y edge matrix
templates) needs to be stored in the data base for each
vehicle. A captured visual image containing the vehicle
within the refuelling facility can then be processed to
generate article edge matrices, and compared to the modified
templates, with each modified template being compared to the
article edge matrices at different locations within the
article edge matrices.

Preferably, a mask of the template is prepared so that
only the outlines and/or internal edges of the article, and
not the surrounding area, are compared to the actual article
edge matrices. The mask also provides expected dimensions
of the article so that only locations within the article
edge matrices within which the article would fit are
searched for the article, and the article can be identified
with a position relatively close to the edge of the view.

Separately comparing the X and Y edge template matrices
with the article X and Y edge matrices significantly
improves the robustness of the method, and results in
reliable fits being found quickly in a variety of light
conditions, with partial obstruction of the view of the
article, and with partial masking by dirt, leaves, grass,
and other articles that may be present in a relatively
uncontrolled environment.

Detailed Description of the Invention

A camera is typically used in the practice of the
present invention to capture a visual image of an article
in a known position and orientation. A digital image can
be captured using one of the commercially available
framegrabber hardware and associated software packages. The
digital image is a matrix of pixels, each of the pixels
having a number that corresponds linearly to a brightness.
A color image can be utilized, in which case the image is
represented by three matrices, one each for red, green and
blue. Typically, images of about 256 by 240 pixels are
preferred for the practice of the present invention because
such a number of pixels results in sufficient resolution and
is within the capacity of relatively inexpensive video
cameras. The video camera may generate an image of about
twice the resolution of a 256 by 240 matrix, in which case
the image can be reduced by averaging blocks of four
adjacent pixels to create a matrix of pixels having one half
the height and one half the width.

Signal to noise ratios can be increased by averaging
two or more consecutive images.
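
The two averaging operations described above can be sketched as follows. This is only an illustrative sketch, assuming an image held as a list of rows of brightness values; the function names average_2x2 and average_frames are hypothetical and do not come from the specification.

```python
def average_2x2(image):
    """Halve height and width by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def average_frames(frames):
    """Average two or more equally sized consecutive frames pixel by pixel."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(cols)]
        for r in range(rows)
    ]
```

Applying average_2x2 once to an image of about twice the preferred resolution would give a matrix of half the height and half the width.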
Edge matrices may be generated from both the images
containing the article in a known location and orientation
relative to the camera, and the images containing the
article within which the location and orientation of the
articles are to be determined, by applying operators such
as the following:

 1  2  1
 0  0  0
-1 -2 -1

and;

 1  0 -1
 2  0 -2
 1  0 -1

As can be seen from these operators, the elements of
each sum to zero, so that a region of uniform brightness
produces no response. The absolute values of the elements
of the resulting matrices indicate the change in brightness
along the x and y axes respectively. The use of edge
magnitudes helps make the appearance invariant to direction
of light and color of the object. Producing these edge
matrices therefore results in images that can be compared
with templates in spite of significant differences in
lighting or color of the article (although color could be
identified as well in the practice of the present
invention). The results of these two operators can be
summed to obtain one edge vector matrix, but in the practice
of the present invention, it is significant that the two are
not combined for comparison. Not combining the two greatly
increases the robustness of the algorithm, i.e. the ability
of the algorithm to identify outlines when the articles are
masked with dirt, partially obscured, or subjected to
varying light conditions.
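
As a hedged illustration of keeping the two edge matrices separate, the sketch below applies the two operators recited in Claim 3 to a small image and stores the absolute responses. The names SOBEL_ROWS, SOBEL_COLS and edge_matrix are hypothetical, and leaving border pixels out of the result is an assumption made here for brevity.

```python
SOBEL_ROWS = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]  # responds to change from row to row
SOBEL_COLS = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]  # responds to change from column to column


def edge_matrix(image, operator):
    """Absolute response of a 3x3 operator at every interior pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(rows - 2):
        for c in range(cols - 2):
            total = 0
            for i in range(3):
                for j in range(3):
                    total += operator[i][j] * image[r + i][c + j]
            out[r][c] = abs(total)  # only the edge magnitude is kept
    return out
```

For an image whose brightness steps from one column to the next, only the SOBEL_COLS matrix is non-zero; the two results are kept as separate matrices rather than summed.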

A mask is preferably placed over the image of the edge
matrices of the article with the known location and
orientation so that only the known outlines of the article
are considered. The masked regions can be referred to as
"don't care" regions, because edge data in these regions
will be ignored when fitting the edge matrices of the
article in the known position to the edge matrices of the
image within which the location and orientation of the
article are to be identified.

The dimensions of the mask can also define limits of
the locations within the image containing the article that
could contain an image of the article. For example, if the
mask were fifty pixels by fifty pixels within an image of
256 pixels by 240 pixels, then only locations within the
middle 206 by 190 pixels could be the center of the mask,
if the entire article is within the image.
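
The arithmetic of this example can be checked with a short sketch. It assumes, for illustration only, that a placement is described by the offset of the mask's top-left corner; the function name placement_ranges is hypothetical.

```python
def placement_ranges(image_w, image_h, mask_w, mask_h):
    """Inclusive ranges of top-left offsets keeping the mask wholly inside the image.

    Returns ((min_x, max_x), (min_y, max_y)), or None if the mask cannot fit.
    """
    if mask_w > image_w or mask_h > image_h:
        return None
    return (0, image_w - mask_w), (0, image_h - mask_h)
```

For a fifty by fifty pixel mask in a 256 by 240 image the offsets span 206 pixels horizontally and 190 pixels vertically, in agreement with the figures given above.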

When a color image is used, edge matrices are
preferably generated for each color, and then the three edge
matrices are preferably combined to form one X or Y edge
matrix. This combination can be by summing the absolute
values of the three edge matrices (and dividing the sum by
three), by selecting the maximum value of the edge matrix
among the three, by calculating an average, or by taking the
square root of the sum of the squares of corresponding
elements of the edge matrices. It may also be possible to
consider two of the three colors in, for example, one of the
preceding ways. The use of a color image improves the fit
to a template by providing that more edge information is
extracted from an image. For example, an edge image at an
interface between colors can be identified even if the
interface is between surfaces having similar brightness.
Use of color images increases the cost of the camera and
the amount of data processing required to take
advantage of having three sets of images, but is preferred
if the difficulty of the application warrants the additional
expense.
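
The combination rules listed above might be sketched as follows. The function name combine_color_edges and the method labels "mean", "max" and "rss" are hypothetical; each input is assumed to be an edge matrix for one color, with all three of the same size.

```python
import math


def combine_color_edges(red, green, blue, method="mean"):
    """Combine three single-color edge matrices element by element."""
    combine = {
        # sum of absolute values divided by three
        "mean": lambda r, g, b: (abs(r) + abs(g) + abs(b)) / 3.0,
        # largest magnitude among the three colors
        "max": lambda r, g, b: max(abs(r), abs(g), abs(b)),
        # square root of the sum of the squares
        "rss": lambda r, g, b: math.sqrt(r * r + g * g + b * b),
    }[method]
    return [
        [combine(r, g, b) for r, g, b in zip(row_r, row_g, row_b)]
        for row_r, row_g, row_b in zip(red, green, blue)
    ]
```

The same function would be called once for the three X edge matrices and once for the three Y edge matrices, so that the X and Y results remain separate.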
The image of the article may be reduced by, for example,
averaging adjacent pixels. An image is therefore created
that contains fewer pixels for comparison of the modified
templates to the article edge matrices for finding an
initial position and orientation estimate. For example, a
matrix of 256 by 240 pixels could be reduced to a matrix of
64 by 60 pixels by two successive averaging and subsampling
operations. Comparisons of the reduced matrices can be
accomplished much more quickly, and then the fit of the
reduced matrices can be used as a starting point for finding
a fit for the larger matrices. Generally, only locations
within a few (one to three) pixels of the pixels averaged
into the best fit result of the reduced matrix need to be
compared at higher levels of resolution.
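
The coarse-to-fine step can be illustrated by listing the locations that would be revisited at the next finer level. This is a sketch under stated assumptions: one two-for-one reduction between levels, locations given as (x, y) offsets, and a hypothetical function name refine_candidates with a hypothetical radius parameter.

```python
def refine_candidates(best_x, best_y, max_x, max_y, radius=2):
    """Offsets to test at the next finer level around a coarse best fit.

    After one two-for-one reduction, coarse pixel best_x covers fine pixels
    2*best_x and 2*best_x + 1 (likewise in y). The search window extends
    `radius` pixels beyond those and is clipped to 0..max_x and 0..max_y.
    """
    xs = range(max(0, 2 * best_x - radius), min(max_x, 2 * best_x + 1 + radius) + 1)
    ys = range(max(0, 2 * best_y - radius), min(max_y, 2 * best_y + 1 + radius) + 1)
    return [(x, y) for y in ys for x in xs]
```

With a radius of one pixel, a coarse best fit at (10, 5) leads to sixteen fine locations rather than a search of the whole matrix.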

Reducing the matrices can significantly reduce
computing time required to compare the template edge
matrices with the article edge matrices. Two reductions,
each being a two for one linear reduction, are recommended.
Each reduction reduces the amount of
information to be considered by a factor of four. The
combined reductions reduce the amount of information by a
factor of sixteen. Further, each of the parameters for
which templates are prepared is of lower resolution,
resulting in fewer sets of rotated templates, and at fewer
locations within the view of the article edge matrix.
Initial searches within the reduced matrices can therefore
be performed in two or three orders of magnitude less time
than if the article edge matrix were searched at the original
level.

When the article for which the image is to be searched
is known, a template of that article, preferably as two edge
matrices with a mask, can be selected from a data base. The
template can then be modified to represent the article in
a plurality of orientations. By orientations, it is meant
that the two dimensional image of the article is rotated
about the axis of the view of the camera, rotated to an
angled view of the object, and/or scaled to represent
changes in distance from the camera. Increments of, for
example, two to three degrees of rotation can be used to
obtain a sufficient number of orientations that one should
have a clearly best fit for the article in a particular
orientation.

For an application such as an overhead camera
identifying a vehicle's position within a bay of an
automated refuelling system, an expected orientation can be
predicted (most drivers drive in relatively straight), and
it can also be predicted that the actual orientation will
not be more than a certain variation (for example, plus or
minus twenty degrees) from the expected orientation. Thus,
only a limited number of modified template edge matrices
need to be created. Creating these modified templates
after the vehicle make and model have been identified
considerably reduces the amount of computer storage needed
to store template matrices.
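
To give a sense of how few modified templates are involved, the sketch below lists the rotation angles for a plus or minus twenty degree range in two degree increments. The function name rotation_angles and its default values are illustrative assumptions taken from the example figures in the text.

```python
def rotation_angles(max_deviation_deg=20.0, step_deg=2.0):
    """Angles, in degrees, at which modified templates would be generated."""
    steps = int(max_deviation_deg // step_deg)
    return [k * step_deg for k in range(-steps, steps + 1)]
```

With these values only twenty-one rotated templates are generated at run time from the single stored base template.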

Rotation of the template matrices about an axis
essentially normal to a plane of the two dimensional view
of the video camera (or "transforming" the image to the new
orientation) is readily accomplished by well known methods.
Such transformations are preferably performed by calculating
the point within the original matrix of pixels at which each
pixel of the transformed matrix would lie, so that the four
pixels of the original matrix surrounding the center of the
pixel from the transformed matrix can be used to interpolate
a value for the pixel of the transformed matrix. Again,
methods to accomplish these interpolations are well known.
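
One well known way of carrying out such a transformation is inverse mapping with bilinear interpolation, sketched below. The function names bilinear and rotate are hypothetical; rotation about the matrix centre, the sign convention of the angle, and a value of zero for points falling outside the original matrix are assumptions made for this sketch.

```python
import math


def bilinear(image, x, y):
    """Interpolate image at real-valued (x, y) from the four surrounding pixels."""
    rows, cols = len(image), len(image[0])
    if x < 0 or y < 0 or x > cols - 1 or y > rows - 1:
        return 0.0  # assumed value outside the original matrix
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rotate(image, angle_deg):
    """Rotate about the matrix centre by inverse mapping each output pixel."""
    rows, cols = len(image), len(image[0])
    cx, cy = (cols - 1) / 2.0, (rows - 1) / 2.0
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = []
    for r in range(rows):
        row = []
        for col in range(cols):
            dx, dy = col - cx, r - cy
            # Locate the point in the original matrix for this output pixel.
            src_x = cx + c * dx + s * dy
            src_y = cy - s * dx + c * dy
            row.append(bilinear(image, src_x, src_y))
        out.append(row)
    return out
```

Working from the transformed matrix back to the original guarantees that every output pixel receives exactly one interpolated value.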
The templates could also be created with the image
scaled to represent the article located at different
distances from the camera. This scaling is accomplished by
changing the dimensions from a center point of the camera
view in inverse proportion to the distance from the
camera. This scaling is preferably performed based on
distances from an effective pinhole, where the effective
pinhole is defined as a point through which a perspective
projection is made by the camera. This effective pinhole
would therefore be slightly behind the lens of the camera.
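
The inverse proportionality can be written out as a small sketch. The function names scale_factor and scaled_offset and the parameter names are hypothetical; both distances are assumed to be measured from the effective pinhole, and offsets are assumed to be measured from the center point of the camera view.

```python
def scale_factor(reference_distance, actual_distance):
    """Apparent size varies inversely with distance from the effective pinhole."""
    if reference_distance <= 0 or actual_distance <= 0:
        raise ValueError("distances must be positive")
    return reference_distance / actual_distance


def scaled_offset(dx, dy, reference_distance, actual_distance):
    """Scale an offset from the view centre to a new distance."""
    k = scale_factor(reference_distance, actual_distance)
    return dx * k, dy * k
```

An article moved to twice the reference distance therefore appears at half the size, and its edges lie at half their original offsets from the view centre.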

A more difficult problem is to identify a location and
orientation of a known article or outline when viewed at an
angle significantly different from normal to a plane
containing the article or outline. For example, a camera
located on a refuelling apparatus may need to locate a
gasoline nozzle cover lid from a position that does not
allow viewing of the cover lid with the camera facing
perpendicular to the plane of the cover lid. A rectangular
lid cover would therefore not appear to the camera to be
rectangular. The distortion from a rectangular shape would
depend upon both the angle and the relative position of the
lid with respect to the center line of the camera's view
(known as the optical axis).

Geometric distortion can be eliminated from images that
are not normal to the optical axis if the article of the
image can be approximated by a planar image. If the angle
of the optical axis from perpendicular of the planar image
of the article is known (i.e., the image to be searched for
the article), geometric distortion can be removed, and
images obtained that represent a transformation to
perpendicular views of the article in the image to be
searched. Likewise, if the templates are created wherein
the optical axis is not perpendicular to the plane of the
template, geometric distortion can be removed from the
templates by such a transformation. If the angle from
normal to the optical axis is not known for the image to be
searched, this angle can be another search parameter.

Such transformation to a perpendicular view is
simplified by the fact that the transformation is the same
for a given angle between the optical axis and the normal
of the plane approximately containing the article,
regardless of the displacement between the camera and the
plane, provided that the target remains in the limits of the
view of the camera.

The preferred transformation method in the practice of
the present invention, rather than placing a pixel from an
article image matrix within a transformed image, takes
a pixel location from the transformed matrix and calculates
the location of that pixel within the article image matrix.
An interpolation is then performed using four pixel values
of the article image matrix surrounding the position of the
inversely transformed matrix pixel to determine the value
of the pixel in the transformed image. The following
equations provide the location of a pixel from the
transformed image on the article image, for the case of the
article plane normal being perpendicular to the image X
axis:

[equation not legible in the source]

and:

[equation not legible in the source]

where:

a = sin(θ)     (5)

and:

b = cos(θ)     (6)

and: p_x is the ratio of actual article image plane x
position to P_Q,

p_y is the ratio of actual article image plane y
position to P_Q,

P_Q is the perpendicular distance from the effective
pinhole to the actual article,

P_Q' is the distance from the plane of the transformed
image to the effective pinhole,

P_tt is the vertical displacement of the camera of the
transformed image relative to the camera position in the
actual image,

σ_y is the y coordinate value in the transformed image,
σ_x is the x coordinate value in the transformed image,

θ is the downward pitch angle of the plane normal to
the camera.

For θ of up to about fifty degrees, the following
ratios can be used to fit a good portion of the original
image into the transformed image:

[equation not legible in the source]     (7)

and

[equation not legible in the source]     (8)

where p_yl is half of the vertical height of the original
image.

Although modified templates can be created with
rotations and changes in distances from the camera, a
plurality of such rotations and changes could result in an
exceedingly large number of modified templates. It is
therefore preferred that searches are carried out over one
variable out of the possible rotating, scaling, and angled
views in the practice of the present invention.

If the orientation of the article with respect to
rotation within a plane perpendicular to the camera view is
expected to be within about twenty degrees of the
orientation of the article having the known orientation, the
template X and Y edge images may be simply individually
rotated to form the modified edge template images prior to
comparing the modified template edge images to the article
edge images. When more than about twenty degrees of
rotation is possible, a new set of edge images is preferably
created based on a combination of the original edge images.
The X and the Y edge image values together represent an edge
vector having an angle (arctan(Y/X)) and a magnitude
((X^2 + Y^2)^(1/2)). This angle may be rotated by the angle of
rotation of the template and new X and Y components
calculated. Typically, only the absolute values of the X
and Y components are stored, and therefore edge vectors in
the first or third quadrant must be differentiated from edge
vectors of the second or fourth quadrant. Edge vectors in
the third and fourth quadrants could be considered as their
negative vectors in the first and second quadrants
respectively, and therefore just two quadrants of vectors
need be identified. Quadrants of edge vectors can be
identified with a single additional binary template
generated from the original template image, the binary
template having pixels representing whether the edge
magnitude vector at that point represents an edge whose
direction vector is in the first or third quadrant, or the
second or fourth quadrant. This template can be
automatically generated from the template image. This
requires very little additional storage space, and can be
used during a rotation operation to adjust the X and Y edge
magnitude weights to their exact proper proportion at very
little extra computational cost. Rotation of the edge
matrices by any amount of rotation can thereby be made
completely valid.
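
The role of the binary quadrant template can be illustrated for a single pixel. In this sketch the function name rotate_edge and the flag name same_sign are hypothetical, and a counter-clockwise rotation convention is assumed; the flag is True where the edge direction lies in the first or third quadrant and False where it lies in the second or fourth.

```python
import math


def rotate_edge(abs_x, abs_y, same_sign, angle_deg):
    """Rotate one edge vector stored as two magnitudes plus a quadrant bit.

    A vector and its negative describe the same edge, so the vector is
    rebuilt in the first quadrant (same_sign) or the second (not same_sign).
    Returns the new magnitudes and the new quadrant bit.
    """
    x = abs_x if same_sign else -abs_x
    y = abs_y
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    new_x = x * c - y * s
    new_y = x * s + y * c
    return abs(new_x), abs(new_y), (new_x * new_y) >= 0
```

Two pixels with identical stored magnitudes but different quadrant bits give different results after rotation, which is why the magnitudes alone are not sufficient for large rotations.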

The following equation is convenient for the purpose
of quantifying the differences between the modified template
edge matrices and the article edge matrices because
image processing cards are commercially available
to quickly generate the comparisons:

P(x,y) = [ SUM_ij X_ij X'_(x+i)(y+j) + SUM_ij Y_ij Y'_(x+i)(y+j) ]
         / [normalizing denominator not legible in the source]     (9)

where X is an X template edge matrix of i by j pixels rotated
to an orientation to be tested against a portion of the
image matrix,

Y is a Y template edge matrix of i by j pixels rotated
to an orientation to be tested against a portion of the
image matrix,

X' is a portion of an image X edge matrix of i by j
pixels located at a position of coordinates x,y on the X
image edge vector matrix,

Y' is a portion of an image Y edge matrix of i by j
pixels located at a position of coordinates x,y on the Y
image edge vector matrix, and

P(x,y) is a grey scale edge correlation normalized for
point (x,y).

The grey scale edge correlation will be a number
between zero and one, with one being a perfect match. Grey
scale correlations are performed for each x and y within the
article edge matrix for which the entire modified template
edge matrix can fit within the article edge matrix. The
resulting grey scale correlation that most closely
approaches unity is the closest fit. Interpolation between
variables can be achieved using linear or squared weighting
above a noise threshold. Such variables may be, for
example, angle of rotation, or x and y locations.

Portions of the calculations to generate these grey
scale edge matrices can be quickly made using a GPB-1
auxiliary card-AlignCard.

"Don't care" regions may also fall within the
boundaries of the i by j dimensioned matrices of the
modified template edge matrix. Pixels in the template
identified as "don't care" are preferably not used in the
summations of the terms of Equation 9.
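
A normalized edge correlation of this general kind, with "don't care" pixels left out of the summations, might be sketched as follows. The normalization used here (the square root of the product of the template energy and the image energy) is an assumption chosen so that the result lies between zero and one for edge magnitudes; it is not taken from Equation 9. The names edge_correlation, best_fit and care are hypothetical.

```python
import math


def edge_correlation(tx, ty, ix, iy, x, y, care=None):
    """Correlate template edges (tx, ty) with image edges (ix, iy) at offset (x, y).

    care[i][j] set to False marks a "don't care" template pixel, which is
    skipped in every summation.
    """
    num = t_energy = i_energy = 0.0
    for i in range(len(tx)):
        for j in range(len(tx[0])):
            if care is not None and not care[i][j]:
                continue
            a, b = tx[i][j], ty[i][j]
            c, d = ix[x + i][y + j], iy[x + i][y + j]
            num += a * c + b * d  # X and Y terms are accumulated separately
            t_energy += a * a + b * b
            i_energy += c * c + d * d
    denom = math.sqrt(t_energy * i_energy)  # assumed normalization
    return num / denom if denom > 0 else 0.0


def best_fit(tx, ty, ix, iy, care=None):
    """Return (score, (x, y)) for the offset whose correlation is closest to one."""
    h, w = len(tx), len(tx[0])
    best = (-1.0, None)
    for x in range(len(ix) - h + 1):
        for y in range(len(ix[0]) - w + 1):
            score = edge_correlation(tx, ty, ix, iy, x, y, care)
            if score > best[0]:
                best = (score, (x, y))
    return best
```

Only offsets at which the whole template fits inside the image edge matrices are visited, matching the search range described above.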

Because the grey scale edge matrix correlation result
is very sensitive to relative displacement of an object's
template and test image, a smoothing operation may be
performed prior to comparison of the two. Although reducing
the matrices as described above has a smoothing effect, a
further smoothing operation may also be included. This
smoothing operation may be performed on each matrix before
the correlation is calculated, but after the subsampling to a
current search level. A preferred smoothing operation is
a Gaussian approximation, given by the following convolution
kernel:

0 1 0
1 4 1     (10)
0 1 0

When this smoothing is applied, it is preferably applied to
both the article edge matrix and the modified template edge
matrix.
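
Applying the kernel of Equation 10 can be sketched as below. Dividing by eight (the sum of the kernel elements) and copying border pixels unchanged are assumptions made for this sketch; the names KERNEL, KERNEL_SUM and smooth are hypothetical.

```python
KERNEL = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
KERNEL_SUM = 8  # assumed normalization so that flat regions keep their value


def smooth(matrix):
    """Convolve interior pixels with KERNEL / KERNEL_SUM; borders are copied."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[float(v) for v in row] for row in matrix]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = 0
            for i in range(3):
                for j in range(3):
                    total += KERNEL[i][j] * matrix[r + i - 1][c + j - 1]
            out[r][c] = total / KERNEL_SUM
    return out
```

The same function would be applied to both the article edge matrices and the modified template edge matrices before the correlation is calculated.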

A preferred application of the method of the present
invention is in the automated refuelling methods disclosed in
U.S. Pat. Appl. Nos. 461,280 (Docket No. TH0622), 461,281
(Docket No. TH0572), and 461,276 (Docket No. TH0573), the
disclosures of which are incorporated herein by reference.

The embodiments described above are exemplary, and
reference is made to the following claims to determine the
scope of the present invention.

CLAIMS

1. A method to identify the location and orientation of 
an article, the method comprising the steps of: 

obtaining an image of the article with the article
having a known orientation and location relative to a
camera;

creating an X and Y template edge matrix from the image
of the article;

creating a plurality of sets of modified template edge
matrices, each of the sets of modified template edge
matrices being an X and Y template edge matrix with the
article in a different orientation;

capturing a digital visual image containing the
article, the digital image being a matrix of pixels;

creating X and Y article edge matrices from the matrix
of pixels;

quantifying the difference between each of the sets of
modified template edge matrices and the X and Y article edge
matrices with the modified template edge matrices placed at 
a plurality of locations within the bounds of the article 
edge matrices; and 

identifying the location and orientation of the article 
as the orientation of the article represented by the set of 
modified template edge matrices at the location within the 
bounds of the X and Y article edge matrices with the minimal 
quantified differences between the modified template edge 
matrices and the X and Y article edge matrices. 

2. The method of Claim 1 further comprising the steps of
smoothing the modified template edge matrices and the
article edge matrices by averaging adjacent pixels and
subsampling to obtain reduced matrices with a reduced number
of pixels;

quantifying the difference between each of the reduced
modified template edge matrices and the reduced article edge
matrices with the modified template edge matrices placed at
a plurality of locations within the bounds of the article
edge matrices; and

quantifying the difference between each of the sets of
modified template edge matrices and the X and Y article edge
matrices with the modified template edge matrices placed at
a plurality of locations within the bounds of the article
edge matrices only for locations near the location and the
orientation having minimum differences between the reduced
matrices.

3. The method of Claim 1 wherein the edge matrices are
obtained by applying to the image to obtain horizontal and
vertical edge matrices respectively the operators:

 1  2  1
 0  0  0
-1 -2 -1

and;

 1  0 -1
 2  0 -2
 1  0 -1



4. The method of Claim 1 wherein a plurality of modified
template edge matrices are created with the template edge
matrix scaled to represent different distances from the
camera.

5. The method of Claim 1 wherein a plurality of modified 
template edge matrices are created with the template edge 
matrix transformed to represent different angles from normal 
to the optical axis. 

6. The method of Claim 1 wherein portions of the template
edge matrices outside of outlines of the article are ignored 
when quantifying the difference between each of the modified 
template edge matrices and the article edge matrices. 

7. The method of Claim 1 wherein the plurality of
locations is every contiguous set of pixels within the
article edge matrices within which a modified template edge
matrix will fit.

8. The method of Claim 2 wherein the plurality of
locations for which the difference between each of the
reduced modified template edge matrices and the reduced
article edge matrices are quantified include every
contiguous set of pixels within the reduced modified
template edge matrices will fit.

9. The method of Claim 1 wherein a color image is 
obtained, and single color edge matrices are created for 
more than one color, and then combined to obtain the 
template edge matrices. 

10. The method of Claim 9 wherein three sets of single color
edge matrices are created.

11. The method of Claim 2 wherein edge matrices are
obtained by applying to the image to obtain vertical and
horizontal edge matrices respectively the operators:

 1  2  1
 0  0  0
-1 -2 -1

and;

 1  0 -1
 2  0 -2
 1  0 -1

12. The method of Claim 11 wherein portions of the template 
edge matrices outside of outlines of the article are ignored 
when quantifying the difference between each of the modified 
template edge matrices and the article edge matrices. 

13. The method of Claim 12 wherein the plurality of 
locations comprises every contiguous set of pixels within 
the matrix of pixels within which a modified template edge 
matrices will fit. 

14. The method of Claim 12 wherein the plurality of
locations for which the difference between each of the
reduced modified template edge matrices and the reduced
article edge matrices are quantified comprise every
contiguous set of pixels within the reduced modified
template edge matrices will fit.

15. The method of Claim 14 wherein a color image is
obtained, and single color edge matrices are created for
more than one color, and then combined to obtain the
template edge matrix.

16. The method of Claim 15 wherein three sets of single
color edge matrices are created and then combined to obtain
the template edge matrix.

17. A method to identify the location and orientation of 
an article, the method comprising the steps of: 
10 obtaining an image of the article with the article 

having a known orientation and location and distance from 
a camera; 

creating a X and Y template edge matrix from the image 
of the article; 

15 creating a plurality of sets of modified template edge 

matrices, each of the sets of modified template edge 
matrices being a X and Y template edge vector matrix with 
the article in a different orientation; 

capturing an digital visual image containing the 
20 article, the digital image being a matrix of pixels; 

creating X and Y article edge matrices from the matrix 
of pixels; 

smoothing the modified template edge matrices and the 
article edge matrices by averaging adjacent pixels and 
25 subsampling to obtain reduced matrices with a reduced number 
of pixels; 

quantifying the difference between each of the reduced 
modified template edge matrices and the reduced article edge 
matrices with the reduced modified template edge matrices 
placed at a plurality of locations within the bounds of the 
article edge matrices; and 

quantifying the difference between each of the sets of 
modified template edge matrices and the article matrices 
with the modified template edge matrices placed at a 
plurality of locations within the bounds of the article edge 
matrices only for locations near the location and the 
orientation having minimum differences between the reduced 
matrices; and 

identifying the location and orientation of the article 
as the orientation of the article represented by the set of 
modified template edge matrices at the location within the 
bounds of the X and Y article edge matrices with the minimal 
quantified differences between the modified template edge 
matrices and the X and Y article edge matrices, 

wherein portions of the template edge matrices outside 
of outlines of the article are ignored when quantifying the 
difference between each of the modified template edge 
matrices and the article edge matrices. 
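
The coarse-to-fine search recited in Claim 17 can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the claimed method itself: it uses a single matrix rather than separate X and Y edge matrices, a single orientation rather than a set of orientations, block averaging as the smoothing step, and a sum of absolute differences as the quantified difference. The function names, the `mask` argument (1 inside the article outline, 0 outside) and the `factor` parameter are hypothetical.

```python
def reduce_matrix(matrix, factor=2):
    """Smooth by averaging factor x factor blocks, then keep one value per block."""
    rows, cols = len(matrix) // factor, len(matrix[0]) // factor
    return [
        [
            sum(matrix[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def difference(image, template, mask, top, left):
    """Sum of absolute differences; template cells with mask == 0 are ignored."""
    total = 0.0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            if mask[r][c]:
                total += abs(image[top + r][left + c] - value)
    return total


def best_location(image, template, mask, rows, cols):
    """Return the (top, left) candidate with the smallest difference."""
    candidates = [(r, c) for r in rows for c in cols]
    return min(candidates, key=lambda p: difference(image, template, mask, *p))


def coarse_to_fine(image, template, mask, factor=2):
    """Search the reduced matrices everywhere, then refine near the coarse minimum."""
    small_image = reduce_matrix(image, factor)
    small_template = reduce_matrix(template, factor)
    # Assumption: a reduced cell is inside the outline if any of its pixels was.
    small_mask = [[1 if v > 0 else 0 for v in row]
                  for row in reduce_matrix(mask, factor)]

    max_top = len(small_image) - len(small_template)
    max_left = len(small_image[0]) - len(small_template[0])
    coarse = best_location(small_image, small_template, small_mask,
                           range(max_top + 1), range(max_left + 1))

    # Full-resolution comparison only within +/- factor pixels of the coarse hit.
    top0, left0 = coarse[0] * factor, coarse[1] * factor
    max_top = len(image) - len(template)
    max_left = len(image[0]) - len(template[0])
    rows = range(max(0, top0 - factor), min(max_top, top0 + factor) + 1)
    cols = range(max(0, left0 - factor), min(max_left, left0 + factor) + 1)
    return best_location(image, template, mask, rows, cols)
```

To cover orientation as well, the same search would be repeated for each set of modified template matrices and the orientation with the smallest overall difference retained.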

18. The method of Claim 17 wherein the plurality of 
locations for which the difference between each of the 
reduced modified template edge matrices and the reduced 
article edge matrices are quantified include every 
contiguous set of pixels within which the reduced modified 
template edge matrices will fit. 

of directions by a camera and stored. Also, the relative 
camera at the respective image capturing are stored. 
An image of a pile of workpieces is captured by the cam- 
era to obtain a two-dimensional image and the position/ 
posture of the camera at the image capturing is stored. 



An image of a workpiece matched with one reference 
model is selected by matching processing of the refer- 
ence model with the captured image. A three-dimen- 
sional position/posture of the workpiece with respect to 
the camera is obtained from the image of the selected 
workpiece, the selected reference model and position/ 
posture information associated with the reference mod- 
el. A picking-up operation for picking out a respective 
workpiece from a randomly arranged pile can be per- 
formed by a robot, based on the position/posture of the 
workpiece. 

Description 

[0001] The present invention relates to an image 
processing apparatus for detecting three-dimensional 
position and posture (orientation) of an object, and in 
particular to an image processing apparatus suitable for 
use in a bin-picking operation for taking out a workpiece 
one by one from a pile of workpieces using an industrial 
machine such as a robot. 

[0002] The operation of taking out an individual work- 
piece from a randomly arranged pile of workpieces or 
an aggregation of workpieces contained in a container 
of a predetermined size, which workpieces have identi- 
cal shapes and different three-dimensional positions/ 
postures, has been performed manually. In storing 
workpieces in a pallet or placing workpieces at a prede- 
termined position in a machine or a device using a (ded- 
icated) robot, since it has been impossible to directly 
take out individual workpieces one by one from the ran- 
domly arranged pile of workpieces by the dedicated ro- 
bot, it has been necessary to rearrange the workpieces 
in advance so as to be picked out by the robot. In this 
rearrangement operation, it has been necessary to take 
out an individual workpiece from the pile manually. 
[0003] The reason why individual workpieces having 
identical shapes and different three-dimensional posi- 
tions/postures cannot be picked out by a robot from a 
randomly arranged pile of workpieces or an aggregation 
of workpieces contained in a container is that the posi- 
tion/posture of individual workpieces in the pile or the 
aggregation cannot be recognised, so that a robot hand 
cannot be placed to a suitable position/posture at which 
the robot hand can hold the individual workpiece. 
[0004] An object of the present invention is to provide 
an image processing apparatus capable of detecting 
three-dimensional position and posture of individual ob- 
jects in a randomly arranged pile or an aggregation in a 
container of a predetermined region, which have iden- 
tical shapes and different three-dimensional positions/ 
postures. 

[0005] An image processing apparatus of the present 
invention comprises an image capturing device; and a 
memory storing reference models created based on image data 
of a reference object captured by the image capturing 
device in a plurality of directions, and storing information 
of the capturing directions to be respectively associated 
with the reference models. The reference object may be 
an object for detection itself or an object having a shape 
identical to that of the object for detection. 
[0006] The image processing apparatus also com- 
prises a processor for performing matching processing 
of image data containing an image of the object for de- 
tection captured by the image capturing device with im- 
age data for reference models to select an image of an 
object matched with one of the reference models, and 
to obtain posture, or posture and position, of the object 
based on the selected image of the object, said one ref- 
erence model and the information of the direction asso- 
ciated with said one reference model. 
[0007] The reference models may be a part of the im- 
age data of the reference object or obtained by process- 
ing the image data of the reference object. 

[0008] The image capturing device may be a camera 
for capturing two-dimensional image data, and in this 
case the image data of the reference object are captured 
by the image capturing device from a predetermined dis- 
tance. Alternatively, the image capturing device may be 
a visual sensor for capturing three-dimensional image 
data, and when the three-dimensional visual sensor is 
adopted the image data containing an image of the ob- 
ject of detection may be two-dimensional arrangement 
data including distance information from the object of 
detection to the image capturing device, a part of said 
two-dimensional arrangement data or a set of distance 
data. 

[0009] The image capturing device may be attached 
to a wrist of a robot. Further, the image data of the ref- 
erence object can be captured in a place different from 
a place where the detection of the object is performed, 
and supplied to the image processing apparatus on line 
or off line. 

[0010] For a better understanding of the invention, 
and to show how the same may be carried into effect, 
reference will now be made, by way of example, to the 
accompanying drawings, in which:- 

FIG. 1 is a diagram for showing a picking-up oper- 
ation by a robot to take out an individual workpiece 
from a pile of workpieces using an image process- 
ing apparatus according to an embodiment of the 
present invention; 

FIGS. 2a-2d show examples of reference models; 
FIG. 3 is a block diagram of a principal part of a robot 
controller; 

FIG. 4 is a block diagram of the image processing 
apparatus according to an embodiment of the 
present invention; 

FIG. 5 is a flowchart of the processing for creating 
reference models; 

FIG. 6 is a flowchart of the processing for the pick- 
ing-up operation; 

FIG. 7 is a diagram showing an example of scanning 
motion of a visual sensor capable of obtaining dis- 
tance data; 

FIG. 8 is a diagram of the two-dimensional arrange- 
ment data containing distance data as image data 
obtained by the visual sensor; 
FIG. 9 is a flowchart of processing for obtaining the 
two-dimensional arrangement data. 

[0011] An embodiment in which an image processing 
apparatus of the present invention is used in combina- 
tion with a robot system will be described. In this em- 
bodiment, an image of a pile of workpieces, which are 
objects for detection having identical shapes and ran- 
domly arranged as shown in FIG. 1, is captured by an 
image capturing device (camera or visual sensor) 20, 
which is attached to a wrist of a robot RB, and position 
and posture (orientation) of the individual workpieces 
are detected based on the captured image. For this pur- 
pose, images of a reference object, which is one of the work- 
pieces W subjected to a picking-up operation or an ob- 
ject having a shape identical to that of the workpiece W, 
are captured in different directions by the image captur- 
ing device and reference models are created from the 
image data obtained by the image capturing and stored 
in advance. Matching processing between the image 
data obtained by capturing the image of the pile of work- 
pieces and the reference models is executed to select 
an image of one workpiece matched with one of the refer- 
ence models, and a position/posture of the selected 
workpiece is determined based on the selected image 
of the workpiece in the image field of view, the selected 
one of the taught models and the position/posture informa- 
tion associated with the selected one of the refer- 
ence models. 

[0012] FIG. 3 is a block diagram showing a principal 
part of a robot controller 10 for use in the embodiment 
of the present invention. A main processor 1, a memory 
2 including a RAM, a ROM and a nonvolatile memory 
(such as an EEPROM), an interface 3 for a teaching op- 
erating panel, an interface 6 for external devices, an in- 
terface 7 for an image processing apparatus and a servo 
control section 5 are connected to a bus 8. A teaching 
operating panel 4 is connected to the interface 3 for a 
teaching operating panel. 

[0013] A system program for supporting basic func- 
tions of the robot RB and robot controller 10 is stored 
in the ROM of the memory 2. Robot operation programs 
and their related determined data which are taught in 
accordance with various operations are stored in the 
nonvolatile memory of the memory 2. The RAM of the 
memory 2 is used for temporary storage of data for var- 
ious arithmetic operations performed by the processor 
1. 

[0014] The servo control section 5 comprises servo 
controllers 5a1 to 5an (n: sum of the number of all the 
axes of the robot including additional movable axes of 
a tool attached to a wrist of the robot), each composed 
of a processor, a ROM, a RAM, etc. Each servo control- 
ler performs position/velocity loop control and also cur- 
rent loop control for its associated servomotor for driving 
the axis, to function as a so-called digital servo controller 
for performing loop control of position, velocity and cur- 
rent by software. Each servomotor M1-Mn for driving 
each axis has its drive controlled according to outputs 
of the associated servo controller 5a1-5an through the 
associated servo amplifier 5b1-5bn. Though not shown 
in FIG. 3, a position/velocity detector is attached to each 
servomotor M1-Mn, and the position and velocity of each 
servomotor detected by the associated position/velocity 
detector is fed back to the associated servo controller 
5a1-5an. Connected to the input/output interface 6 are 
sensors of the robot, and actuators and sensors of pe- 
ripheral devices. 

[0015] FIG. 4 is a block diagram of the image process- 
ing apparatus 30 connected to an interface 7 of the robot 
controller 10. The image processing apparatus 30 com- 
prises a processor 31 to which a ROM 32 for storing a 
system program to be executed by the processor 31, an 
image processor 33, an image-capturing-device inter- 
face 34 connected to the image capturing device 20, an 
MDI 35 with a display such as a CRT or a liquid crystal 
display for inputting and outputting various commands 
and data, a frame memory 36, a nonvolatile memory 37, 
a RAM 38 for temporary storage of data and a commu- 
nication interface 39 for the robot controller are connect- 
ed. An image captured by the camera 20 is stored in the 
frame memory 36. The image processor 33 performs 
image processing on images stored in the frame 
memory 36 on demand of the processor 31 so as to rec- 
ognise an object. The architecture and function of the 
image processing apparatus 30 itself are in no way different 
from the conventional image processing apparatus. The 
image processing apparatus 30 of the present invention 
is different from the conventional one in that reference 
models as described later are stored in the nonvolatile 
memory 37 and pattern matching processing is per- 
formed on an image of a pile of workpieces W captured 
by the image capturing device 20 using the reference 
models to obtain the position and posture of a workpiece 
W. 

[0016] The image capturing device 20 is used for ob- 
taining image data, as described later, and may be a 
CCD camera for obtaining two-dimensional image data 
or a visual sensor capable of obtaining three-dimension- 
al image data including distance data. In the case of us- 
ing the CCD camera, the image data is obtained by a 
conventional method based on two-dimensional images 
captured by the CCD camera, but in the case of the vis- 
ual sensor capable of obtaining three-dimensional data 
including distance data, two-dimensional arrangement 
data with distance data between the sensor and an ob- 
ject is obtained. A visual sensor for obtaining the three- 
dimensional data including distance data is known, for 
example, from three-dimensional visual sensors of a 
spot light scanning type disclosed in Japanese Patent 
Publication No. 7-270137, and the summary of the 
three-dimensional visual sensor is described below. 

[0017] This visual sensor detects a three-dimensional 
position of an object by irradiating a light beam to form 
a light spot on the object for scanning the object in two 
different directions (X direction and Y direction) and by 
detecting the light reflected on the object by a position 
sensitive detector (PSD). The three-dimensional position of 
the object is measured by a calculation using the re- 
spective inclination angles θx, θy of mirrors for scanning 
and an incident position of the reflected light beam on 
the PSD. 

[0018] Referring to FIGS. 7-9, a method of obtaining 
two-dimensional arrangement data including distance 
data using the three-dimensional visual sensor will be 
explained briefly. 

[0019] Scanning range (measuring range) on an ob- 
ject is set in advance, and an inclination angle θx, θy of 
the mirrors is controlled discretely. As shown in FIG. 7, 
the scanning is performed from a point (1, 1) to a point 
(1, n), from a point (2, 1) to a point (2, n), ..., and from a point 
(m, 1) to a point (m, n) on the X-Y plane within the scan- 
ning range, to measure three-dimensional positions of 
each reflected point on the object. Also, a distance Z (i, 
j) between the sensor and the reflection point (i, j) on 
the object is obtained and stored in the RAM 38 of the 
image processing apparatus 30. Thus, the image data 
is obtained as two-dimensional arrangement data in- 
cluding the distance data Z (i, j) between the sensor and 
the reflection point on the object, as shown in FIG. 8. 
[0020] FIG. 9 is a flowchart of processing to be exe- 
cuted by the processor 31 of the image processing ap- 
paratus 30 for obtaining the image data. 
[0021] First, indexes i and j are respectively set to "1" 
(Step 300) and the inclination angle (θx, θy) of the mir- 
rors is set to (x1, y1) to direct to the start point (1, 1) and 
an irradiation command with the inclination angle is 
sent to the sensor 20 (Steps 301-303). The sensor ir- 
radiates a light beam with the mirrors set at the inclina- 
tion angle. The signal representing the image captured 
by the PSD is sent to the image processing apparatus 
30. The processor 31 of the image processing appara- 
tus 30 calculates the position of the reflection point on 
the object from the signal from the PSD and the inclination 
angle (θx, θy) of the mirrors to obtain the distance Z (i, 
j) between the sensor and the position of the reflection 
point on the object. This value Z (i, j) is stored in the 
RAM 38 as the two-dimensional arrangement data [i, j] 
(Steps 304, 305). The calculation for obtaining the posi- 
tion of the reflection point and the distance Z (i, j) may 
be performed by the sensor 20. 

[0022] Then, the index i is incrementally increased by 
"1" and the inclination angle θx of the mirror for X-axis 
direction scanning is increased by the predetermined 
amount Δx (Steps 306, 307). It is determined whether or 
not the index i exceeds the set value n (Step 308). If the 
index i does not exceed the set value n, the procedure 
returns to Step 303 and the processing from Step 303 
to Step 308 is executed to obtain the distance Z (i, j) of 
the next point. Subsequently, the processing of Steps 
303-308 is repeatedly executed until the index i ex- 
ceeds the set value n to obtain and store the distance Z 
(i, j) of the respective points (1, 1) to (1, n) shown in FIG. 
7. 

[0023] If it is determined that the index i exceeds the 
set value n in Step 308, the index i is set to "1" and the 
index j is incrementally increased by "1" to increase the 
inclination angle θy of the mirror for Y-axis direction 
scanning (Steps 309-311). Then, it is determined wheth- 
er or not the index j exceeds the set value m (Step 312) 
and if the index j does not exceed the set value m, the 
procedure returns to Step 302 to repeatedly execute 
the processing of Step 302 and the subsequent Steps. 



[0024] Thus, the processing from Step 302 to Step 
312 is repeatedly executed until the index j exceeds the 
set value m. If the index j exceeds the set value m, the 
points in the measurement range (scanning range) 
shown in FIG. 7 have been measured entirely, the dis- 
tance data Z (1, 1) - Z (m, n) as two-dimensional arrange- 
ment data are stored in the RAM 38 and the image data 
obtaining processing is terminated. A part of the image 
data of two-dimensional arrangements or a plurality of 
distance data can be obtained by appropriately omitting 
the measurement of the distance for the index i. 
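
The scanning loop of FIG. 9 can be sketched in Python as two nested loops. This is a minimal illustration under assumptions: `measure` is a hypothetical stand-in for the sensor and the PSD calculation (it receives the two mirror angles and returns a distance), the mirror for Y-axis scanning is assumed to advance by a fixed step `dy`, and the step numbers in the comments are an approximate mapping to the flowchart.

```python
def scan_distances(measure, n, m, x1, y1, dx, dy):
    """Raster-scan the mirrors and return Z as m rows of n distances.

    measure(theta_x, theta_y) is a hypothetical callable that irradiates the
    beam at the given mirror angles and returns the measured distance.
    """
    z = []
    j, theta_y = 1, y1                      # Step 300: indexes start at 1
    while j <= m:                           # Step 312: stop once j exceeds m
        row = []
        i, theta_x = 1, x1                  # restart the X-direction scan
        while i <= n:                       # Step 308: stop once i exceeds n
            row.append(measure(theta_x, theta_y))   # Steps 303-305
            i += 1                          # Step 306
            theta_x += dx                   # Step 307
        z.append(row)
        j += 1                              # Steps 309-311: next scan line
        theta_y += dy
    return z
```

Each inner list holds the distances of one scan line, so the returned list plays the role of the two-dimensional arrangement data held in the RAM 38.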

[0025] The foregoing is a description of the process- 
ing for obtaining two-dimensional arrangement data as 
image data using the visual sensor capable of measur- 
ing the distance. Using the two-dimensional arrange- 
ment data obtained in this way as image data, creation 
of reference models and detection of position and pos- 
ture (orientation) of an object can be performed. In order 
to simplify the explanation, the following description will 
be made assuming that a CCD camera 20 is used as an 
image capturing device and the two-dimensional image 
data obtained by capturing image of the object by this 
camera 20 is used. 

[0026] Processing for creating reference models will 
be explained referring to FIGS. 2a-2d and FIG. 5. FIG. 
5 is a flowchart showing processing for teaching refer- 
ence models to the image processing apparatus 30 ac- 
cording to the present invention. 

[0027] One reference workpiece (one of the workpiec- 
es W as object for robot operation or a workpiece having 
a three-dimensional shape identical to that of the work- 
piece W) is prepared for creating reference models. A 
first (0-th) position/posture of the reference workpiece 
at which the camera 20 attached to a distal end of a robot 
wrist captures the image of the object is set, and an axis 
of rotation and rotation angles with respect to the first 
(0-th) position/posture are set in order to determine the 
subsequent positions/postures of the reference work- 
piece. In addition, the number of positions/postures of 
the workpiece at which the camera 20 captures the im- 
age of the object is set. In this example, information of 
both position and posture is used. However, it is suffi- 
cient for creating reference models to use only posture 
(orientation) information if the demanded precision of 
position is not high. 

[0028] As shown in FIGS. 2a to 2d, in this example, 
images of the reference workpiece are captured from 
four different directions and reference models are cre- 
ated based on the four image data. As shown in FIG. 
2a, an image of the reference workpiece is captured 
from the direction of a Z-axis of a world coordinate sys- 
tem at 0-th position/posture to create 0-th reference 
model. For setting the subsequent positions/postures, 
an axis perpendicular to an optical axis of the camera 
and passing a central point of the workpiece (origin of a 
work coordinate system set to the workpiece) and rota- 
tion angles of the workpiece along the rotation axis are 
set for this camera position. Since the optical axis of the 
camera is set parallel to the Z axis of the world coordi- 
nate system, an axis parallel to either the X-axis or the 
Y-axis of the world coordinate system, which is perpen- 
dicular to the Z axis, can be selected and the workpiece 
is rotated around the rotation axis at the workpiece po- 
sition. 

[0029] In the example, an axis parallel to the X-axis 
of the world coordinate system is set as the rotation axis, 
and for the position/posture shown in FIG. 2b, the rota- 
tion angle of 30° is set to rotate the workpiece by 30° 
with respect to the camera along the rotation axis. A first 
reference model is created based on the image data of 
the workpiece at the position/posture shown in FIG. 2b. 
Similarly, as shown in FIGS. 2c and 2d, the workpiece 
is rotated by 60° and 90°, respectively, along the rotation 
axis for capturing images of the workpiece to create 2nd 
and 3rd reference models. 

[0030] In this example, rotation angles of zero degree, 
30 degrees, 60 degrees and 90 degrees are set for cre- 
ating four reference models. The dividing range of the 
rotation angles may be set more finely and/or range of 
the rotation angle may be set greater to create more ref- 
erence models for more precise detection of the posi- 
tion/posture of the workpiece. 

[0031] The processing for creating the four reference 
models will be explained referring to the flowchart of FIG. 5. 
[0032] As described above, the 0-th position/posture 
of the robot at which the camera 20 captures the image 
of the object, and the rotation axis and the rotation an- 
gles with respect to the 0-th position/posture are set in 
advance in order to determine the subsequent positions/ 
postures of the reference workpiece, and also the 
number of the subsequent positions/postures of the 
workpiece are set. For intelligible explanation, it is as- 
sumed that an optical axis of the camera is parallel to 
the Z-axis of the world coordinate system and that a po- 
sition where the X-axis and Y-axis coordinate values are 
identical to those of the reference workpiece and only 
the Z-axis coordinate value is different from that of the 
position of the reference workpiece is taught to the robot 
as the 0-th image capturing position for obtaining the 
0-th reference model. Further, the positions of the robot 
where the camera is rotated with respect to the refer- 
ence workpiece by 30 degrees, 60 degrees and 90 de- 
grees along the axis passing the central point of the ref- 
erence workpiece and parallel to the X-axis of the world 
coordinate system are set as the 1st, 2nd and 3rd image 
capturing positions, and the number N of the image cap- 
turing positions is set to "4." 
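
As an illustration of how the four image capturing positions relate to one another, the following Python sketch computes camera positions on a circle around the workpiece. It assumes the camera starts directly above the workpiece centre at a fixed distance and is rotated about an axis through that centre parallel to the world X-axis; the function name, the `distance` parameter and the direction of positive rotation are assumptions, not details taken from the text.

```python
import math


def capture_positions(center, distance, angles_deg=(0, 30, 60, 90)):
    """Camera positions obtained by rotating about an X-parallel axis through center."""
    cx, cy, cz = center
    positions = []
    for angle in angles_deg:
        a = math.radians(angle)
        # The offset (0, 0, distance) rotated about the X axis by angle a
        # becomes (0, -distance * sin(a), distance * cos(a)).
        positions.append((cx,
                          cy - distance * math.sin(a),
                          cz + distance * math.cos(a)))
    return positions


# Hypothetical values: workpiece centre at (100, 200, 0), camera 500 units away.
for position in capture_positions((100.0, 200.0, 0.0), 500.0):
    print(position)
```

The 0-th position differs from the workpiece centre only in its Z coordinate, and every position keeps the same distance to the centre, which matches a camera capturing the reference object from a predetermined distance.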

[0033] When a command for creating reference mod- 
els is inputted from teaching operation panel 4, the proc- 
essor 1 of the robot controller 10 sets a counter M for 
counting the number of the image capturing to "0" (Step 
100). The robot is operated to have the M-th position/ 
posture and a command for image capturing is output- 
ted to the image processing apparatus 30 (Step 101). 
In response to this command, the image processing ap- 
paratus 30 performs capturing of an image of the refer- 
ence workpiece with the camera 20 and the captured 
image data is stored in the frame memory 36. Further, 
relative position/posture of the workpiece with respect 
to the camera is obtained and stored in the nonvolatile 
memory 37 as relative position/posture of M-th refer- 
ence model, and a data-captured signal is sent to a robot 
controller (Step 103). Thus, position/posture of the 
workpiece in a camera coordinate system set to the 
camera is obtained from the position/posture of the cam- 
era and the position/posture of the reference workpiece 
in the world coordinate system when capturing the im- 
age by the camera, and is stored as the relative position/ 
posture of the workpiece with respect to the camera. For 
example, the position/posture of the workpiece in the 
camera coordinate system is stored as [x0, y0, z0, α0, 
β0, γ0]c, where α, β and γ mean rotation angles around 
X-, Y-, Z- axes, and "c" means the camera coordinate 
system. 

[0034] Upon receipt of the data-captured signal, the 
processor 1 of the robot controller 10 incrementally in- 
creases the value of the counter M by "1" (Step 104) 
and determines whether or not the value of the counter 
M is less than a set value N (=4) (Step 105). If the value 
of the counter M is less than the set value N, the proce- 
dure returns to Step 101 to move the robot to the M-th 
image-capturing position/posture. Thus, in the example 
as shown in FIGS. 2a-2d, the camera is successively 
turned by 30 degrees around the axis parallel to X axis 
of the world coordinate system and passing the work- 
piece position, and successively captures the image of 
the workpiece, and reference models and relative po- 
sitions/postures of the camera with respect to the work- 
piece at the image capturing are stored. 

[0035] Processing of Steps 101-105 is repeatedly ex- 
ecuted until the value of the counter M equals the set 
value N (=4), and the reference models and the relative 
positions/postures of the camera and the workpiece are 
stored in the nonvolatile memory 37. Thus, the refer- 
ence models created from the image data of the work- 
piece at the positions/postures shown in FIGS. 2a-2d 
are stored, and the relative positions/postures between 
the camera and the workpiece for respective reference 
models are stored as positions/postures of the work- 
piece W in the camera coordinate system as [x0, y0, 
z0, α0, β0, γ0]c, [x1, y1, z1, α1, β1, γ1]c, [x2, y2, z2, 
α2, β2, γ2]c, and [x3, y3, z3, α3, β3, γ3]c. 
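
The teaching loop of FIG. 5 can be summarised by the Python sketch below. The three callables are hypothetical stand-ins: `move_robot` for the motion command of the robot controller 10, `capture` for image capture by the image processing apparatus 30, and `relative_pose` for whatever computes the pose of the workpiece in the camera coordinate system. Only the counter logic follows the description; everything else is an assumption made for illustration.

```python
def teach_reference_models(move_robot, capture, relative_pose, n=4):
    """Collect n reference models, each stored with its relative position/posture."""
    stored = []
    m = 0                                   # Step 100: counter M starts at 0
    while m < n:                            # Step 105: continue while M < N
        move_robot(m)                       # Step 101: take the M-th position/posture
        model = capture()                   # image data used as the M-th reference model
        stored.append((model, relative_pose(m)))   # Step 103: store the relative pose
        m += 1                              # Step 104: increment the counter
    return stored
```

A stored entry pairs one reference model with a pose such as [x, y, z, α, β, γ] in the camera coordinate system, so the entry selected later during matching directly yields the pose associated with it.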

[0036] The reference models and the relative posi- 
tion/posture of the workpiece W and the camera 20 are 
stored in the nonvolatile memory 37 of the image 
processing apparatus 30. In the above described em- 
bodiment, the reference models are created using a ro- 
bot; however, the reference models may be created by 
a manual operation without using a robot. In this case, 
the reference workpiece is arranged within the field of 
view of the camera connected to the image processing 
apparatus 30, and the images of the workpiece with dif- 
ferent postures are captured by the camera. The refer- 
ence models are created based on the image data and 
the relative positions/postures of the camera and the 
workpiece at the image capturing, which are manually inputted, 
and are stored with the respective relative positions/ 
postures. 

[0037] The reference models may be created from a 
part of the image data of the reference object, and may 
be created by processing the image data of the refer- 
ence object. 

[0038] In addition, the reference models may be cre- 
ated based on the stored image data of the reference 
workpiece when detecting the position/posture of the 
objective workpiece, without creating and storing the 
reference models in advance. 

[0039] Hereinafter, a picking-up operation for taking 
out an individual workpiece by a robot from a pile of 
workpieces each having a shape identical to that of the 
reference workpiece will be described, as an example 
of a method of detecting three-dimensional position/ 
posture of an object, using the image processing appa- 
ratus 30 storing the reference models. 
[0040] FIG. 6 is a flowchart of the carrying out of the 
picking-up operation. When a picking-up command is 
inputted into the robot controller 10 from the teaching 
operation panel 4, the processor 1 operates the robot 
RB to move the camera attached to the robot wrist to an 
image capturing position where a pile of workpieces is 
within a field of view of the camera 20 (Step 200). The 
three dimensional position/posture of the camera 20 on 
the world coordinate system at this image capturing po- 
sition is outputted to the image processing apparatus 
30, and an image capturing command is outputted (Step 
201). Upon receipt of the image capturing command, the 
processor 31 of the image processing apparatus 30 cap- 
tures an image of the pile of the workpieces W to obtain 
image data of some workpieces W and store the data in 
the frame memory 36 (Step 202). 
[0041] Then, pattern matching processing is per- 
formed for the image data stored in the frame memory 
36 using one of the reference models (the first reference 
model) stored in the nonvolatile memory 37 so as to de- 
tect a workpiece W (Step 203). In this pattern matching 
processing, matching of the image data of the reference 
model with the image data of workpieces is performed 
on the basis of position, turn and scale. It is determined 
whether or not an object has a matching value equal to or 
greater than the set value (Step 204). If an object having 
a matching value equal to or greater than the set value is 
not detected, the procedure proceeds to Step 205 to de- 
termine whether or not the pattern matching is per- 
formed using all the reference models (1st to 4th refer- 
ence models). If the pattern matching using all the ref- 
erence models is not yet performed, further pattern 
matching is performed using another reference model 
(Step 206). 

[0042] If it is determined in Step 204 that an object
having a matching value equal to or greater than the set
value with respect to any of the reference models is
detected, the procedure proceeds to Step 207 to perform
matching processing on the two-dimensional data of the
detected workpieces W using every taught model. In
Step 208, the reference model having the largest matching
value in the pattern matching processing is selected,
and the relative position/posture of the workpiece W with
respect to the camera 20 is determined based on the
relative position/posture of the camera and the reference
workpiece stored for the selected reference model,
and the position, rotation angle and scale of the image of
the workpiece in the matching processing.
The position and posture (orientation) of the detected
workpiece in the world coordinate system is determined
from the position and posture of the camera 20 in the
world coordinate system, which has been sent in Step
201, and the relative position/posture of the workpiece
W with respect to the camera 20, and is outputted (Step
209). That is, since the relative position/posture of the
workpiece W with respect to the camera 20 is the
position/posture of the workpiece W in the camera
coordinate system, the position and posture (orientation) of
the detected workpiece W in the world coordinate system is
obtained by an arithmetic operation of coordinate
transformation using the data of the position/posture of the
workpiece W in the camera coordinate system and the
position/posture of the camera 20 in the world coordinate
system (Step 209).
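
The coordinate transformation of Step 209 can be illustrated with homogeneous transforms. The following is a minimal sketch and not the apparatus's actual implementation: it assumes that poses are represented as 4x4 homogeneous matrices and, for brevity, that rotation occurs about a single axis. The names `pose_matrix`, `compose`, `world_T_camera` and `camera_T_workpiece`, and all numeric values, are hypothetical and introduced only for illustration.

```python
import math

def pose_matrix(x, y, z, yaw_deg):
    """Build a 4x4 homogeneous transform from a translation and a rotation
    about the Z axis (single-axis rotation is an illustrative simplification)."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]

def compose(a, b):
    """Return the matrix product a * b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical camera pose in the world coordinate system (sent in Step 201).
world_T_camera = pose_matrix(500.0, 200.0, 800.0, 90.0)
# Hypothetical workpiece pose in the camera coordinate system (Step 208).
camera_T_workpiece = pose_matrix(10.0, 0.0, 300.0, 0.0)

# Step 209: workpiece pose expressed in the world coordinate system.
world_T_workpiece = compose(world_T_camera, camera_T_workpiece)
position = [row[3] for row in world_T_workpiece[:3]]
```

Composing the two transforms in this order expresses the workpiece pose in world coordinates; a full implementation would use a complete three-axis rotation rather than the single yaw angle assumed here.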

[0043] The reference model having the highest
matching value is selected in this embodiment, although
a reference model of the rotation angle of zero degrees
(the 0-th reference model) may be selected with
priority, or an object having the highest
expansion rate of scale (the object which is nearest to
the camera, i.e. located at the summit of the pile in this
example) may be selected with priority.
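
The three selection rules mentioned above can be compared side by side. The sketch below is illustrative only: the `Match` record, the `select_match` function, the policy names and the sample values are all assumptions made for this example, and falling back to the highest matching value when no zero-degree match exists is a design choice of the sketch, not something stated in the text.

```python
from dataclasses import dataclass

@dataclass
class Match:
    model_index: int  # index of the matched reference model (0 = zero-degree model)
    score: float      # matching value from pattern matching
    scale: float      # expansion rate of the matched image

def select_match(matches, policy="highest_score"):
    """Pick one candidate according to the chosen (hypothetical) policy name."""
    if not matches:
        return None
    if policy == "highest_score":
        return max(matches, key=lambda m: m.score)
    if policy == "zero_degree_first":
        zero = [m for m in matches if m.model_index == 0]
        # Assumption: fall back to all matches if no zero-degree match exists.
        return max(zero or matches, key=lambda m: m.score)
    if policy == "nearest_first":
        # The largest expansion rate corresponds to the object nearest the camera.
        return max(matches, key=lambda m: m.scale)
    raise ValueError(f"unknown policy: {policy}")

candidates = [Match(0, 0.82, 1.00), Match(2, 0.91, 1.05), Match(1, 0.85, 1.20)]
best_by_score = select_match(candidates)
best_zero_degree = select_match(candidates, "zero_degree_first")
best_nearest = select_match(candidates, "nearest_first")
```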

[0044] The robot controller 10 operates the robot to
perform a picking-up operation to grip and hold the
detected workpiece W and move the held workpiece W to
a predetermined position, based on the three-dimensional
position/posture of the workpiece W (Step 210).
Then, the procedure returns to Step 202 to repeatedly
execute the processing of Step 202 and subsequent
Steps.

[0045] When all the workpieces have been picked up
from the pile of the workpieces, a matching value equal
to or greater than the set reference value cannot be
obtained in the pattern matching processing for all
reference models in Steps 203-206, and the picking-up
operation is terminated.
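
The repeat-until-nothing-matches behaviour described above can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the patented procedure: real pattern matching is replaced by a stand-in `match_value` function that compares orientation labels, and the names `detect`, `pick_all`, the labels and the threshold value are hypothetical.

```python
def match_value(model, workpiece):
    """Toy stand-in for pattern matching: 1.0 when the workpiece's
    orientation label equals the model's label, otherwise 0.0."""
    return 1.0 if model == workpiece else 0.0

def detect(pile, reference_models, threshold):
    """Try each reference model in turn (Steps 203-206) and return the first
    workpiece whose matching value reaches the threshold, or None."""
    for model in reference_models:
        for workpiece in pile:
            if match_value(model, workpiece) >= threshold:
                return workpiece
    return None  # every reference model was tried without a sufficient match

def pick_all(pile, reference_models, threshold=0.8):
    """Repeat detection and picking until no workpiece is detected."""
    pile = list(pile)  # work on a copy so the caller's list is untouched
    picked = []
    while True:
        workpiece = detect(pile, reference_models, threshold)
        if workpiece is None:
            break  # no match for any reference model: terminate picking
        pile.remove(workpiece)  # stands in for the pick-up of Step 210
        picked.append(workpiece)
    return picked, pile

picked, remaining = pick_all(
    ["tilt30", "flat", "tilt60", "flat"],
    ["flat", "tilt30", "tilt60", "tilt90"],
)
```

Note that in this sketch the loop also stops if workpieces remain that match none of the reference models, since the termination test is based only on the matching value.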

[0046] In the case where a pile of the workpieces
cannot fall within the field of view of the camera 20, or in the
case where it is not necessary to capture an image of a
workpiece behind other workpieces by changing the
orientation of the camera, the procedure may return to
Step 200 when "Yes" is determined in Step 205, to move
the camera to another position/posture at which an
image of the object workpiece can be captured.
[0047] In addition, in the case where the robot and the
image processing apparatus 30 are used in combination
as in the foregoing embodiment, the robot controller 10
may store the three-dimensional position/posture of the
camera without outputting it to the image processing
apparatus 30 in Step 201, and the relative position/posture
of the workpiece and the camera may be outputted from
the image processing apparatus 30 to the robot controller
10 in Step 208 to execute the processing of Step 209
in the robot controller 10.

[0048] Further, in the case where a wide-angle lens is
installed in the CCD camera as the image capturing
device, for example, there is a possibility of judging the
inclination angle to be 30 degrees under the influence of
parallax when a workpiece of zero-degree inclination is at a
corner of the field of view of the camera. In such a case, the
camera may be moved in parallel in accordance with the
position of the workpiece in the field of view of the
camera to a position right above the workpiece so that the
effect of parallax is eliminated, and at this position the image
capturing processing of Step 201 and the subsequent
Steps in FIG. 6 are performed so that a false judgment is
prevented.
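
The size of the parallax effect and the corrective camera move can be estimated with simple geometry. The sketch below rests on assumptions not stated in the text: an ideal pinhole camera looking straight down, workpiece offsets expressed along axes aligned with the camera's translation axes, and the simplification that the apparent inclination equals the angle between the optical axis and the line of sight. The function names and numeric values are hypothetical.

```python
import math

def viewing_angle_deg(offset_x, offset_y, height):
    """Angle between the optical axis and the line of sight to a workpiece
    that lies (offset_x, offset_y) away from the axis at the given height."""
    lateral = math.hypot(offset_x, offset_y)
    return math.degrees(math.atan2(lateral, height))

def recenter(camera_xy, offset_x, offset_y):
    """Translate the camera in parallel so the workpiece lies on the optical axis."""
    return (camera_xy[0] + offset_x, camera_xy[1] + offset_y)

height = 500.0
# Lateral offset at which the line of sight is tilted by 30 degrees.
offset = height * math.tan(math.radians(30.0))

angle_before = viewing_angle_deg(offset, 0.0, height)
new_camera_xy = recenter((100.0, 50.0), offset, 0.0)
# After the parallel move the workpiece is directly below the camera.
angle_after = viewing_angle_deg(0.0, 0.0, height)
```

Under these assumptions a 30-degree error arises when the lateral offset is about 0.58 times the camera height, and it vanishes once the camera is directly above the workpiece.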

[0049] Furthermore, in order to obtain three-dimen- 
sional position/posture of an object workpiece whose 
three-dimensional position/posture is unknown without 
using a robot, the camera is arranged to capture an im- 
age of a pile of workpieces or a region containing the 
object workpiece within a field of view of the camera, 
and the position/posture of the camera in the world co- 
ordinate system is inputted to the image processing ap- 
paratus 30 and an object detection command is issued 
to the image processing apparatus 30, to make the im- 
age processing apparatus 30 execute Steps 202-209 of 
FIG. 6. 

[0050] The image data for creating the reference
models may be obtained at a place different from the
place where the robot is installed. In this case, the image
data may be supplied to the image processing apparatus
online through a communication interface provided
in the image processing apparatus, or may be supplied
offline through a disk drive for reading a floppy disk, etc.
[0051] According to the present invention, the posi- 
tion/posture of an object workpiece in a randomly ar- 
ranged pile of workpieces or an aggregation of work- 
pieces gathered in a predetermined region which have 
identical shapes and different three-dimensional posi- 
tions/postures is detected, to thereby enable a robot to 
automatically pick out an individual workpiece from such 
a pile or an aggregation. 



Claims 



1. An image processing apparatus for detecting posture,
or posture and position, of an object, comprising:

an image capturing device;

a memory storing reference models created
based on image data of a reference object captured
by said image capturing device in a plurality of
directions, and storing information of
the capturing directions to be respectively associated
with said reference models, said reference
object being the object for detection or
an object having a shape identical to that of the
object for detection; and

a processor to perform matching processing of
image data containing an image of the object
for detection captured by said image capturing
device with image data for said reference models
to select an image of an object matched with
one of said reference models, and to obtain
posture, or posture and position, of the object
based on the selected image of the object, said
one reference model and the information of the
direction associated with said one reference
model.

2. An image processing apparatus according to claim
1, wherein said reference models comprise a part
of the image data of the reference object.

3. An image processing apparatus according to claim
1 or 2, wherein said reference models are obtained
by processing the image data of the reference object.

4. An image processing apparatus according to any
preceding claim, wherein said image capturing device
comprises a camera for capturing two-dimensional
image data.

5. An image processing apparatus according to claim
4, wherein said image data of the reference object
are captured by said image capturing device from
a predetermined distance.

6. An image processing apparatus according to any
one of claims 1 to 3, wherein said image capturing
device comprises a visual sensor for capturing
three-dimensional image data.

7. An image processing apparatus according to claim
6, wherein said image data containing an image of
the object for detection captured by said visual sensor
are two-dimensional arrangement data including
distance information from the object of detection
to the image capturing device, a part of said
two-dimensional arrangement data or a set of distance
data.

8. An image processing apparatus according to any
one of claims 1 through 7, wherein said image capturing
device is attached to a robot.

9. An image processing apparatus according to claim
1, wherein said image data of the reference object
are captured in a place different from a place where
the detection of the object is performed, and supplied
to the image processing apparatus on line or
off line.